Stream Clipper: Scalable Submodular Maximization on Stream

Authors

  • Tianyi Zhou
  • Jeff A. Bilmes
Abstract

Applying submodular maximization in the streaming setting is nontrivial because the commonly used greedy algorithm exceeds the fixed memory and computational limits typically needed during stream processing. We introduce a new algorithm, called stream clipper, that uses two thresholds to select elements either into a solution set S or an extra buffer B. The output is achieved by a greedy algorithm that starts from S and then, if needed, greedily adds elements from B. Swapping elements out of S may also be triggered lazily for further improvements, and elements may also be removed from B (and corresponding thresholds adjusted) in order to keep memory use bounded by a constant. Although the worst-case approximation factor does not outperform the previous worst-case of 1/2, stream clipper can perform better than 1/2 depending on the order of the elements in the stream. We develop the idea of an “order complexity” to characterize orders on which an approximation factor of 1 − α can be achieved. In news and video summarization tasks, stream clipper significantly outperforms other streaming methods. It shows similar performance to the greedy algorithm but with less computation and memory costs.
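The two-threshold idea in the abstract can be illustrated with a minimal sketch. This is not the authors' stream clipper: the function name `two_threshold_stream`, the parameters `tau_high`, `tau_low`, and `buffer_size`, the rule for evicting from the buffer, and the toy `coverage` objective are all assumptions made for illustration. The lazy swapping and threshold adjustment mentioned in the abstract are omitted.

```python
from typing import Callable, Iterable, List, Set

Item = Set[int]


def coverage(selected: List[Item]) -> int:
    """Toy monotone submodular objective: number of distinct ids covered."""
    return len(set().union(*selected))


def two_threshold_stream(
    stream: Iterable[Item],
    f: Callable[[List[Item]], int],
    k: int,
    tau_high: float,
    tau_low: float,
    buffer_size: int,
) -> List[Item]:
    """Hypothetical sketch: one pass with two thresholds, then a greedy fill.

    Assumed rules (not taken from the paper):
      - gain >= tau_high and |S| < k   -> add the item to the solution S
      - otherwise gain >= tau_low      -> keep the item in the buffer B
      - otherwise                      -> discard the item
    """
    solution: List[Item] = []
    buffer: List[Item] = []

    def gain(item: Item) -> int:
        # Marginal gain of adding `item` to the current solution.
        return f(solution + [item]) - f(solution)

    for item in stream:
        g = gain(item)
        if len(solution) < k and g >= tau_high:
            solution.append(item)
        elif g >= tau_low:
            buffer.append(item)
            if len(buffer) > buffer_size:
                # Assumed eviction rule: drop the buffered item whose
                # current marginal gain is smallest, so memory stays bounded.
                worst = min(range(len(buffer)), key=lambda i: gain(buffer[i]))
                buffer.pop(worst)

    # Final phase: start from S and greedily add elements from B if needed.
    while len(solution) < k and buffer:
        best = max(range(len(buffer)), key=lambda i: gain(buffer[i]))
        if gain(buffer[best]) <= 0:
            break
        solution.append(buffer.pop(best))
    return solution
```

With a stream of small sets and `coverage` as the objective, high-gain items go straight into the solution, moderate-gain items wait in the bounded buffer, and the final loop fills any remaining slots from the buffer.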

Similar resources

Budgeted stream-based active learning via adaptive submodular maximization

Active learning enables us to reduce the annotation cost by adaptively selecting unlabeled instances to be labeled. For pool-based active learning, several effective methods with theoretical guarantees have been developed through maximizing some utility function satisfying adaptive submodularity. In contrast, there have been few methods for stream-based active learning based on adaptive submodu...

Do Less, Get More: Streaming Submodular Maximization with Subsampling

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, ...

Submodular Maximization over Sliding Windows

In this paper we study the extraction of representative elements in the data stream model in the form of submodular maximization. Different from the previous work on streaming submodular maximization, we are interested only in the recent data, and study the maximization problem over sliding windows. We provide a general reduction from the sliding window model to the standard streaming model, an...

Budgeted Nonparametric Learning from Data Streams

We consider the problem of extracting informative exemplars from a data stream. Examples of this problem include exemplar-based clustering and nonparametric inference such as Gaussian process regression on massive data sets. We show that these problems require maximization of a submodular function that captures the informativeness of a set of exemplars, over a data stream. We develop an efficien...

Horizontally Scalable Submodular Maximization

A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity – number of instances that can fit in memory – must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physic...

Journal title:
  • CoRR

Volume abs/1606.00389

Publication date 2016